We define an optimal basis system into which cosmological observables can be decomposed. The basis system can be optimized for a specific 
cosmological model or for an ensemble of models, even if based on drastically different physical assumptions. The projection coefficients 
derived from this basis system, the so-called features, provide a common parametrization to study and compare different cosmological models 
independently of their physical construction. They can be used to directly compare different cosmologies and study their degeneracies in terms 
of a simple metric separation. This is a very convenient approach, since only a very small number of realizations have to be computed in 
contrast to Markov Chain Monte Carlo methods. Finally, the proposed basis system can be applied to reconstruct the Hubble expansion rate 
from supernova luminosity distance data with the advantage of being sensitive to possible unexpected features in the data set. We test the 
method both on mock catalogues and on the SuperNova Legacy Survey data set. 
1. Introduction 

In the last decade quantities such as the luminosity distance-redshift
relation of type-Ia supernovae (e.g. Riess et al. 1998; Perlmutter et al. 1999;
Astier et al. 2006; Kowalski et al. 2008), the baryonic acoustic oscillations
(e.g. Eisenstein et al. 2005), the Cosmic Microwave Background (CMB) power
spectrum (e.g. de Bernardis et al. 2002; Komatsu et al. 2009), the cosmic shear
correlation function (e.g. Benjamin et al. 2007; Fu et al. 2008), etc. have been
the object of intense observational efforts to build our picture of the universe.
All of these data sets are usually interpreted and explained through a direct
comparison with a specific model, or a class of models such as Friedmann
cosmologies, which are inevitably based on simplifications and assumptions.
A remarkable example is the equation of state parameter of dark energy, w,
whose behaviour is still poorly understood. Thus, if the adopted model ignores
unexpected features which may actually exist, the results may be largely
misleading. Several authors have highlighted the pitfalls that the weak
dependence of the observables on the equation of state parameter creates for
the conclusions that can be drawn on the dark-energy properties (e.g. Maor et al.
2001, 2002; Bassett et al. 2004).



A model-independent approach, instead, may not be affected by such
limitations. The importance of a model-independent reconstruction of the cosmic
expansion rate from luminosity distance data has been widely discussed in the
past decade. The possibility of reconstructing the dark-energy potential from
the expansion rate, $H(a)$, or from the growth rate of linear density
perturbations, $\delta(a)$, was first pointed out by Starobinsky (1998), where
the relations between the observational data and the expansion rate are
presented. Several different techniques have been developed since then to
appropriately treat the data in order to perform such a reconstruction (see,
e.g. Huterer & Turner 1999, 2000; Tegmark 2002; Wang & Tegmark 2005), all of
them employing a smoothing procedure in redshift bins. A recent reconstruction
technique, which recovers the expansion function from distance data, has been
developed in Shafieloo et al. (2006) and Shafieloo (2007), making use of data
smoothing over redshift with Gaussian kernels, and it has been generalised by
Alam et al. (2008) to reconstruct the growth rate from the estimated expansion
rate. An alternative method, proposed by Mignone & Bartelmann (2008) (hereafter
MB08), reconstructs the expansion rate directly from the luminosity-distance
data, expanding them into a basis system of orthonormal functions, thus
avoiding binning in redshift, and it has been extended in order to estimate the
linear growth factor and to be applied to cosmic shear data (Mignone et al., in
prep.). Principal component analysis (PCA) has also been used to reconstruct
the dark-energy equation of state parameter as a function of redshift (see,
e.g., Huterer & Starkman 2003; Huterer & Cooray 2005; Crittenden & Pogosian
2005; Linder & Huterer 2005; Simpson & Bridle 2006; Huterer & Peiris 2007;
Tang et al. 2008).

In addition to data interpretation, recent years have also seen the
proliferation of cosmological models based on a very wide spectrum of physical
assumptions, such as the existence of dark energy and dark matter, gravity
beyond the standard general relativity framework or peculiar large-scale matter
distributions (for a recent review see Durrer & Maartens 2008). This raises the
issue of comparing all these models and studying their mutual degeneracies in
an efficient way, which is not straightforward because of their very different
physical backgrounds; this is the case also when different dark energy
prescriptions are adopted.

In this paper, we make use of a principal component ap- 
proach to define a basis system capable of providing a parame- 
terization describing cosmologies independently of their back- 
ground physics and of allowing the detection of possible unex- 
pected features not foreseen by the adopted models. For the lat- 
ter point, this basis system is used to improve the MB08 method 
to derive the Hubble expansion rate from supernova luminosity 
distances through a direct inversion of the luminosity distance 
equation. Our principal component approach differs from those
already proposed in the literature because it aims at modelling
observables rather than underlying physical quantities such as w.
This is done starting not from data but from theoretical models,
which ensures that the derived basis is optimized for the specific
model.

The structure of the paper is as follows. In Sect. (2) we
discuss the derivation and properties of the optimal basis system, in
Sect. (3) we discuss how the projection on the defined basis
system can be used as a new cosmological parameterization, and
in Sect. (4) we optimize the non-parametric Hubble expansion
reconstruction proposed by MB08. Finally, we present our
conclusions in Sect. (5).

2. An optimized basis system for cosmological 
data sets 

This section presents an application of principal component
analysis (e.g. Tegmark et al. 1997) to define a linear transformation
to optimally describe cosmological observables. The
transformation is defined such as to maximize the capability of
discerning different cosmological models and to highlight the
possible existence of unexpected features not foreseen when a
specific model is adopted. We derive our approach having in
mind the analysis of cosmological data sets, but its application
is completely general.

2.1. Principal components derivation 

Any data set can be represented by a vector $d \in \mathbb{R}^n$ whose
dimension $n$ corresponds to the number of available data points,
e.g. the number of available supernovae and/or CMB multipoles
in a given data set. The $n$-dimensional space containing
all of these possible $d$-vectors allows us to address the problem
directly through observable quantities regardless of their
underlying physics. We investigate this space by sampling it with
a set of $M$ vectors $\{t_i \in \mathbb{R}^n \mid i = 1, \dots, M\}$ modelling
the data (hereafter the training set), which we organize in the matrix
$T = (t_1, t_2, \dots, t_M) \in \mathbb{R}^{n \times M}$ for convenience. The
vectors $t_i$ can have the same discrete structure as the analyzed data, which
can be discrete and irregular. In principle, these models can be
a set of arbitrary functions, and actually the choice is fully
arbitrary, but it is convenient to consider models at least weakly
resembling the data set. It is in fact pointless to sample the
entire domain of behaviours when we at least know the main data
properties. The choice of the training set only determines for
which kind of models, or better behaviours of the observables,
the performance of the principal components is optimal. This is the
reason why we find it convenient to use a set of theoretical
models. This choice does not limit the flexibility of the method, as
will be demonstrated.

Once the possible models which are spanned by data have
been sampled, the extraction of the information they contain
can be optimized via a linear transformation $W: \mathbb{R}^n \rightarrow \mathbb{R}^n$
mapping the training-set vectors into a space (hereafter feature
space) where their projections,

$$\tau_i = W^T (t_i - \bar{t}) \quad \mathrm{with} \quad i = 1, \dots, M , \tag{1}$$

have the maximum scatter in very few components. We call
feature vectors any vector resulting from the projection
expressed by Eq. (1) and features their components. This linear
transformation, $W = (w_1, w_2, \dots, w_n)$, is given by a set of $n$
orthonormal vectors $\{w_i \in \mathbb{R}^n \mid i = 1, \dots, n\}$ known as principal
components. The principal components are found by solving
the following eigenvalue problem

$$S w_i = \lambda_i w_i , \tag{2}$$

and by sorting them in descending order, $\lambda_i > \lambda_{i+1}$, to ensure the
largest feature separation in the very first components. Here

$$S = \Delta \Delta^T \in \mathbb{R}^{n \times n} , \tag{3}$$

with $\Delta = (t_1 - \bar{t}, t_2 - \bar{t}, \dots, t_M - \bar{t}) \in \mathbb{R}^{n \times M}$, is the so-called
scatter matrix, which encodes the differences (or scatter)
between each training vector $t_i$, i.e. a given model, and the
reference vector $\bar{t}$ around which the scatter is maximized. The
reference vector defines the origin of the feature space and is
usually set as the average of the training set, $\bar{t} = \langle t \rangle$, but a
different $\bar{t}$ can be used instead, depending on the specific problem at
hand. An interesting choice could be the best fit to a given
cosmological model, so that all other models would be described
as its perturbed states.

Note that the principal components derived in this way
constitute a full basis system for the training-set cosmologies only.
However, they turn out to be very flexible and even able to
reproduce behaviour not present in the training-set models, as
shown in Sec. (4.2). In this work, we choose to base our
training set on Friedmann ΛCDM cosmologies with different
cosmological parameters, but of course other kinds of cosmological
models can be used as well, e.g. cosmologies with
dynamical dark energy, based on modified gravity theories, or
even a mixture of them so that they can be optimally described
at the same time.
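
The derivation above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: the redshift sampling, the 5 x 5 parameter grid, the integration scheme and all function and variable names are assumptions made for the example. It builds a training set of non-flat ΛCDM luminosity distances, forms the scatter matrix of Eq. (3) around the training-set average, and solves the eigenvalue problem of Eq. (2).

```python
import numpy as np

def luminosity_distance(z, omega_m, omega_l, n_steps=512):
    """Luminosity distance in units of c/H0 for a non-flat LCDM model (w = -1)."""
    omega_k = 1.0 - omega_m - omega_l
    # integration nodes from 0 to each redshift, shape (n_steps, len(z))
    x = np.linspace(0.0, 1.0, n_steps)[:, None] * z
    inv_e = 1.0 / np.sqrt(omega_m * (1 + x)**3 + omega_k * (1 + x)**2 + omega_l)
    # trapezoidal rule along the node axis: line-of-sight comoving distance
    d_c = np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(x, axis=0), axis=0)
    if omega_k > 1e-8:        # open geometry
        d_m = np.sinh(np.sqrt(omega_k) * d_c) / np.sqrt(omega_k)
    elif omega_k < -1e-8:     # closed geometry
        d_m = np.sin(np.sqrt(-omega_k) * d_c) / np.sqrt(-omega_k)
    else:                     # flat geometry
        d_m = d_c
    return (1 + z) * d_m

# Hypothetical redshift sampling (an assumption, not the actual SNLS redshifts).
z = np.linspace(0.015, 1.01, 117)

# Training set: a coarse 5 x 5 grid in (Omega_m, Omega_Lambda).
grid = [(om, ol) for om in np.linspace(0.1, 0.5, 5)
                 for ol in np.linspace(0.5, 0.9, 5)]
T = np.column_stack([luminosity_distance(z, om, ol) for om, ol in grid])  # n x M

t_ref = T.mean(axis=1, keepdims=True)      # reference vector: training-set average
Delta = T - t_ref                          # scatter of each model about the reference
S = Delta @ Delta.T                        # scatter matrix, Eq. (3)

eigval, eigvec = np.linalg.eigh(S)         # symmetric solver, ascending eigenvalues
order = np.argsort(eigval)[::-1]           # re-sort in descending order
lam, W = eigval[order], eigvec[:, order]   # eigenvalues and principal components

tau = W.T @ Delta                          # feature vectors of the training set
print("eigenvalue ratios:", lam[:4] / lam[0])
```

Using the symmetric solver `numpy.linalg.eigh` is a design choice for this sketch; any eigen-solver for real symmetric matrices would serve equally well.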
Fig. 1. The first two features, i.e. the first two components of the feature vectors $\tau$, of an ensemble of Friedmann cosmologies 
with $h = 0.7$, $w_{de} = -1$, $\sigma_8 = 0.8$, and the matter density and dark energy density ranging in the intervals $0.1 < \Omega_m < 0.5$ 
and $0.5 < \Omega_\Lambda < 0.9$, respectively. Each black point is related to the corresponding $i$-th cosmological model by the mapping 
$\tau_i = W^T t_i$. The left panel refers to SNLS data alone and the right panel to WMAP-5yr data alone. The red points mark the feature 
space origin, i.e. the reference model $\bar{t}$, and the red points with error bars show the projection of the two data sets. 



2.2. Principal components as an optimization problem

The derivation of the principal components can be interpreted
as a constrained optimization problem, where the linear
orthonormal transformation $W$ maximizing the separation
between different cosmologies is sought. This is achieved by
maximizing the functionals $L_i = w_i^T S w_i - \lambda_i (w_i^T w_i - 1)$ with
respect to $w_i$, i.e. by looking for the solution of $\delta L_i / \delta w_i = 0$.
This leads to the eigenvalue problem expressed by Eq. (2) and
consequently to the same principal components $w_i$. With this
approach, the first principal component can be seen as an
optimal matched filter which in this case operates directly on
cosmological data sets (see, e.g. Seljak & Zaldarriaga 1996;
Maturi et al. 2005; Schäfer et al. 2006).

2.4. Combining different data sets

The approach described above offers a straightforward way
to combine different observables for a joint data analysis. For
example, if we want to combine the luminosity distances to
SN Ia, the CMB angular power spectrum and the cosmic shear
correlation function, we just need to define the data vector and
training set of the form

$$d = [D_L(z_1), \dots, D_L(z_{n_{sn}}), C_{\ell_1}, \dots, C_{\ell_{n_{cmb}}}, \xi(\theta_1), \dots, \xi(\theta_{n_\xi})] , \tag{5}$$

whose dimension is given by the sum of all data vector sizes,
$n = n_{sn} + n_{cmb} + n_\xi$. In order to work with non-dimensional
quantities which reflect the signal-to-noise ratios, the different
observables have to be re-normalized with respect to their
variance.
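
As a minimal sketch of this stacking, the snippet below concatenates three observables after dividing each by its own 1-sigma uncertainty, so that every entry is a dimensionless signal-to-noise value. Dividing by the standard deviation is an assumption about how the variance normalisation is carried out; the helper name `stack_observables` and all numbers are placeholders invented for the illustration.

```python
import numpy as np

def stack_observables(blocks, sigmas):
    """Concatenate observables after dividing each one by its 1-sigma uncertainty."""
    return np.concatenate([np.asarray(b, dtype=float) / np.asarray(s, dtype=float)
                           for b, s in zip(blocks, sigmas)])

# Placeholder values standing in for D_L(z_i), C_l and xi(theta_i).
d_lum, sig_lum = [0.1, 0.5, 1.2], [0.01, 0.05, 0.1]
c_ell, sig_cl = [5000.0, 2500.0], [100.0, 50.0]
xi, sig_xi = [1e-4, 5e-5], [1e-5, 1e-5]

d = stack_observables([d_lum, c_ell, xi], [sig_lum, sig_cl, sig_xi])
print(d.shape)  # total dimension n = n_sn + n_cmb + n_xi
```

The same function would be applied to every training-set vector, using the data uncertainties, so that data and models live in the same normalised space.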



2.3. Speeding up computations

If the number of training vectors is smaller than their dimension,
i.e. $M < n$, only the first $M$ principal components can be
associated with non-vanishing eigenvalues. Therefore, only those
components need to be derived. This is achieved by computing
the $M$ eigenvectors $w'_i \in \mathbb{R}^M$ of the matrix

$$S' = \Delta^T \Delta \in \mathbb{R}^{M \times M} . \tag{4}$$

These are related to the first $M$ eigenvectors of the scatter
matrix $S$ by $w_i = \Delta w'_i$ (up to normalisation). The increase in
computational speed is especially remarkable for large data sets, where $M \ll n$.

In addition to this gain in computational speed, all the
relevant information is, in most cases, contained in a very small
number of independent components, $m < M$ (usually up to
three for this kind of application), allowing an even stronger
dimensionality reduction. In other words, a full data
description is guaranteed by the subspace $\mathbb{R}^m$ sampled by the training
set.
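
The shortcut through the small $M \times M$ matrix can be checked with a few lines of code. This is a sketch under stated assumptions: the training set is random noise used purely as a stand-in, and the sizes, the tolerance used to discard null modes and the variable names are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 500, 12                                # data dimension and training-set size
Delta = rng.normal(size=(n, M))               # stand-in training vectors
Delta -= Delta.mean(axis=1, keepdims=True)    # centre on the training-set average

# Eigen-decomposition of the small M x M matrix instead of the n x n scatter matrix.
S_small = Delta.T @ Delta
lam, V = np.linalg.eigh(S_small)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

# Centring on the average removes one degree of freedom, so one eigenvalue
# is numerically zero here; keep only the non-vanishing modes.
keep = lam > 1e-10 * lam[0]

# Map back to data space and normalise: w_i is proportional to Delta w'_i.
W = Delta @ V[:, keep]
W /= np.linalg.norm(W, axis=0)
```

The columns of `W` are orthonormal eigenvectors of the full scatter matrix with the same non-zero eigenvalues as the small matrix, which is what makes the reduction exact rather than approximate.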



3. A new cosmological parametrization 

Since the features $\tau$ discussed in Sec. (2) retain all significant
cosmological information, they can be used to parametrize
cosmologies. In contrast with the 'standard' cosmological
parameters, they aim to describe observable quantities instead of
physical properties.

To give a visual impression of how cosmologies are
represented in the feature space, we show two simple
examples in Fig. 1 for luminosity distances only (left panel) and
CMB power spectra only (right panel). Here, we plot the first
two components resulting from non-flat ΛCDM models, where
only the matter and dark-energy densities are varied
independently in the ranges $0.1 < \Omega_m < 0.5$ and $0.5 < \Omega_\Lambda < 0.9$,
respectively. The Hubble constant in units of 100 km/s/Mpc, the
equation of state parameter of dark energy and the normalization of the
matter fluctuation power spectrum were fixed to $h = 0.7$,
$w = -1$ and $\sigma_8 = 0.8$, respectively. Each point represents a ΛCDM
cosmology with specific cosmological parameters. Note that,
in the case of CMB data, the $\Omega_m$-$\Omega_\Lambda$ feature space plane
is curved such that at least three features would be necessary
for a satisfactory description of the most extreme cosmologies
considered. This is because of the rich complexity of the data
set. To cope with this, a non-linear mapping could be used to
'follow the distortion' of the models' hyper-plane in the
feature space, but this would add unnecessary complications since
the use of a larger number of features is not a limitation. In
any case the actual CMB physical models have a large
number of parameters with large mutual degeneracies (for instance
the optical depth, the baryon fraction, the inflation spectral
index, etc.), so the more complex models usually adopted
do not necessarily require a linearly growing number of
features, which compensates for the increase in the number of features needed.

Fig. 2. Example showing the stability of the principal components against the number of models used in the training set. We 
show only the first 4 principal components derived for the luminosity distance sampled at the redshifts covered by the SuperNova 
Legacy Survey. The training set was produced by sampling the parameter space in the range $0.1 < \Omega_m < 0.5$ and $0.5 < \Omega_\Lambda < 0.9$, 
100 (left panel) and 5 times (right panel), respectively, and using as a reference cosmology the training-set average. 

With this formalism, the principal components can be
considered as cosmological eigen-modes (eigen-cosmologies)
where observations would "excite" (i.e. make visible) a given
number of modes according to their accuracy.

3.1. The advantages 

The use of these orthonormal functions to define a parameter 
set characterizing observable behaviour instead of underlying 
physical quantities has several advantages. In fact: 



- the features are fully independent by definition and therefore
  avoid any redundancy and degeneracy in the observable
  description, in contrast with physical parameterizations;

- they retain all available information because they are
  derived from the principal components;

- their number is minimal as allowed by the data accuracy;

- they provide the best discriminatory power for the family
  of cosmologies adopted in the training set;

- they can be related to any physical model via the mapping
  $\tau_i = W^T t_i$ itself;

- the features make it possible to quantify the overall difference between
  two cosmologies in terms of a simple metric separation (Eq. 6),
  where $\tau_1$ and $\tau_2$ are the feature vectors of the two
  cosmologies and $\tau_\sigma$ is the projection of the data uncertainty
  into the feature space.
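
Two of the properties listed above can be illustrated with a short numerical sketch. The explicit form of the metric separation is not reproduced here, so the distance below is an assumption: a Euclidean separation in feature space weighted by the projected data covariance (a Mahalanobis-type distance). The training set is a random stand-in, the error level is hypothetical, and the principal components are obtained from a singular value decomposition, whose left singular vectors coincide with the eigenvectors of the scatter matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n, M, m = 40, 15, 3                               # data points, models, retained features
T = rng.normal(size=(n, M)).cumsum(axis=0)        # stand-in training vectors
Delta = T - T.mean(axis=1, keepdims=True)

# Left singular vectors of Delta are the eigenvectors of S = Delta Delta^T.
U, s, _ = np.linalg.svd(Delta, full_matrices=False)
W = U[:, :m]                                      # first m principal components
tau = W.T @ Delta                                 # features of every training model

# Independence: the scatter of the features is diagonal.
feature_scatter = tau @ tau.T

# Hypothetical uncorrelated data errors, projected into feature space.
sigma = np.full(n, 0.5)
C_tau = W.T @ np.diag(sigma**2) @ W

def separation(tau_1, tau_2, cov):
    """Assumed metric: feature-space distance in units of the projected data errors."""
    diff = tau_1 - tau_2
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

print(separation(tau[:, 0], tau[:, 1], C_tau))
```

With such a normalisation, two models separated by less than about one unit would be indistinguishable at the 1-sigma level of the assumed data errors.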

These properties apply also to cosmologies which are not
explicitly included in the training set; however, in this case
not all model behaviours are guaranteed to be captured. In other
words, if nature or the cosmological model we are investigating
differs from the one adopted in the definition of the principal
components, we could still use them, even if under suboptimal
conditions. In any case, it is possible to cope with this by making
the training set less specialized. If we are for example studying
cosmologies based on different physical frameworks such as
General Relativity, TeVeS or f(R) theories, or simply different
dark-energy models, we could include all of them in the
training set so that the resulting features can optimally describe all
of them at the same time. It follows that the features
$\tau$ can be used as a common parametrization to describe
and compare cosmologies even if based on different physical
frameworks. Again, this is possible because this approach
parametrizes observables only and not their very diverse
background physics. In this paper we only consider non-flat ΛCDM
cosmologies for the sake of simplicity.

3.2. Studying different modellizations degeneracies 

The proposed parametrization, thanks to the properties
discussed in Sec. (3.1), provides a useful tool to study degeneracies
within the same or, more interestingly, between different modellizations
and physical frameworks. In fact, fully degenerate models
show the same observational properties and consequently have
the same features $\tau$. Of course, when considering observational
data, degeneracies are not associated with a feature space point
but with the hyper-volume defined by the projection of the data errors.
The data errors also have to be projected into the feature
space to define the region compatible with the data, as shown
in Fig. 1. All information regarding how similar, i.e. degenerate,
two models are is quantified by the metric distance given
in Eq. (6), which is normalized with respect to the data
accuracy. All models whose separation is smaller than
the hyper-volume radius allowed by data are degenerate.

In comparison with Markov Chain Monte Carlo methods
(see for example Lewis & Bridle 2002), this approach is not
an iterative method and is computationally cheap since only a small
number of models have to be computed. In fact, the parameter
space can be sampled on a very coarse grid and, if necessary,
according to the conditional distribution of the parameters, in analogy
with Gibbs sampling. In a follow-up paper we will present a
detailed study of this parametrization and of its application in
degeneracy studies.

4. Hubble expansion rate from supernovae data 

Supernova luminosity distances are a very powerful probe to
investigate cosmology. In particular, they can be used to
directly measure the expansion history of the universe, $H(a)$,
avoiding any reference to Friedmann models. In fact, if we
assume a topologically simply connected, homogeneous and
isotropic universe, the luminosity distance can be expressed as

$$D_L(a) = \frac{c}{H_0} \frac{1}{a} \int_a^1 \frac{dx}{x^2} \, e(x) , \tag{7}$$




where the expansion function is expressed as $H(a) = H_0 E(a)$,
with $H_0$ being the Hubble constant and $e(a) = E^{-1}(a)$
the inverse expansion rate. For the sake of simplicity, we
drop the $c/H_0$ factor in the following discussion. As detailed in
MB08, the derivative of Eq. (7) with respect to the scale factor
$a$ can be brought into the form of a Volterra integral equation
of the second kind,

$$e(a) = -a^3 D_L'(a) + a \int_1^a \frac{dx}{x^2} \, e(x) , \tag{8}$$

which can be directly solved in terms of a Neumann series
$e(a) = \sum_{i=0}^{\infty} e_i(a)$ (Arfken & Weber 1995). A possible choice
for the expansion terms $e_i$ is

$$e_0(a) = -a^3 D_L'(a) , \qquad e_i(a) = a \int_1^a \frac{dx}{x^2} \, e_{i-1}(x) , \tag{9}$$



where it is necessary to smooth the observational data before
the estimation of the derivative $D_L'(a)$ because of the intrinsic
data scatter. A convenient way to do it is to expand the
luminosity distance data into a set of orthonormal functions,

$$D_L(a) = \sum_j c_j \, p_j(a) , \tag{10}$$

with the advantage of avoiding all assumptions regarding the
energy content of the universe. The choice of the adopted
basis is arbitrary. For illustrative reasons, MB08 adopted the
linearly independent set $u_j(a) = a^{j/2-1}$ ortho-normalised with the
Gram-Schmidt process. However, the basis $\{p_j\}$ can be defined
such as to minimize the number of necessary modes and to
have them ordered according to their information content. A
good choice fulfilling these criteria is represented by the
principal components defined in Sec. (2), which can be optimized
for a specific cosmology or for a set of cosmological models
based on different physical assumptions. This basis optimization
enhances the performance of the method without limiting
its flexibility. In fact, behaviour not described by the
models adopted in the basis definition can also be reproduced, as
will be shown in Sec. (4.2).

Fig. 3. Fit of our supernova luminosity distance mock catalogue
resembling the SNLS data set (left panel) and the recovered
expansion rate (right panel). We compare the
results obtained by using the original MB08 recipe (shaded
area) and by using our principal components as basis set (blue
squares) (legend: simulated data; PCA, 1 coefficient; MB08, 3 coefficients).
The increased accuracy is especially evident at lower
redshifts, where the improved method takes full advantage
of the smaller measurement errors.
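
The inversion of the Volterra equation, Eq. (8), can be sketched numerically. The snippet below is an illustration under stated assumptions rather than the MB08 code: it iterates the standard Neumann series for a Volterra equation of the second kind, with kernel $a/x^2$, using a trapezoidal rule on a regular grid, and it is checked on an Einstein-de Sitter universe, for which $D_L(a) = 2(1-\sqrt{a})/a$ and $e(a) = a^{3/2}$ in units of $c/H_0$. The grid, the number of terms and all function names are choices made for the example.

```python
import numpy as np

def integral_from_one(g, a):
    """Trapezoidal estimate of the integral of g from 1 to a_k on an ascending grid ending at a = 1."""
    seg = 0.5 * (g[1:] + g[:-1]) * np.diff(a)           # area of each grid cell
    # integral from a_k up to 1, accumulated from the right-hand end
    to_one = np.concatenate([np.cumsum(seg[::-1])[::-1], [0.0]])
    return -to_one                                       # flip limits: from 1 down to a_k

def neumann_inverse_expansion(a, dDL_da, n_terms=30):
    """Sum the Neumann series of e(a) = -a^3 D_L'(a) + a * int_1^a dx/x^2 e(x)."""
    term = -a**3 * dDL_da                                # zeroth-order term e_0
    e = term.copy()
    for _ in range(n_terms):
        term = a * integral_from_one(term / a**2, a)     # next term from the previous one
        e += term
    return e

# Check on an Einstein-de Sitter model, where the answer is known analytically.
a = np.linspace(0.4, 1.0, 2001)
dDL_da = -2.0 / a**2 + a**(-1.5)                         # derivative of 2 (1 - sqrt(a)) / a
e_rec = neumann_inverse_expansion(a, dDL_da)
print(np.max(np.abs(e_rec - a**1.5)))
```

In a real application the analytic derivative would be replaced by the derivative of the smooth basis expansion fitted to the data.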

4.1. Principal components stability 

The stability of the principal components with respect to the
number of models used in the training set has been tested. We
show in Fig. 2 the first four principal components derived for
a luminosity-distance data vector with the same redshift
sampling as the SuperNova Legacy Survey (Astier et al. 2006). The
training set is based on non-flat ΛCDM models with $h = 0.7$,
$w = -1$ and the matter and dark-energy density parameters
sampling the ranges $0.1 < \Omega_m < 0.5$ and $0.5 < \Omega_\Lambda < 0.9$,
respectively; as a reference cosmology, the average of the training
set has been used. The $(\Omega_m, \Omega_\Lambda)$ space was regularly sampled
by the training set 10,000 times to produce the left panel and
only 25 times for the right panel. Clearly, the principal
components are very stable against the training-set size and only
depend on the range spanned by the cosmological parameters
of the training set.

As discussed in Sec. (2), the information content of each
principal component is quantified by the corresponding
eigenvalue, which in this case are $\lambda_1 = 1$, $\lambda_2 = 2.0 \times 10^{-4}$,
$\lambda_3 = 1.4 \times 10^{\ldots}$ and $\lambda_4 = 1.2 \times 10^{\ldots}$. Hence, all information and
discriminatory power is concentrated in the very first components,
allowing a strong dimensionality reduction, from $n = 117$ (i.e.
the number of supernovae in the data set) to 1 or 2 dimensions
for this specific case. If the number of parameters sampled in
the training-set construction is increased, or if the intervals over
which they are sampled are larger, the power is distributed
towards higher orders, but is still fairly concentrated in very few
components.

4.2. Application to synthetic data: highlighting 
unexpected features 

We apply the MB08 method combined with the principal
components described in Sec. (2) to a synthetic data sample drawn
from a ΛCDM model with $\Omega_m = 0.3$, $\Omega_\Lambda = 0.7$ and $h = 0.7$,
and resembling the SNLS properties (Astier et al. 2006). The
training set for the principal components definition is the same
tested in Sec. (4.1). We show in Fig. 3 the resulting fit to
the data (left panel) and the subsequently estimated expansion
rate (right panel) as compared with the original MB08 recipe
(shaded area). The resulting error bars are smaller when the
principal components are used since the adopted basis is
optimized for ΛCDM cosmologies and fewer parameters have to be
fitted: one coefficient, rather than three. The increased accuracy
is particularly evident at lower redshifts, where the improved
method takes full advantage of the smaller measurement
errors.
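
Why fewer coefficients lead to smaller error bars can be seen in a toy weighted least-squares fit. This is a sketch with invented ingredients: the basis is a set of three orthonormalised polynomials standing in for either basis system, the noise level is hypothetical, and the mock data are built so that a single mode describes them. The helper `fit_in_basis` is introduced only for this illustration.

```python
import numpy as np

def fit_in_basis(data, sigma, basis):
    """Weighted least-squares coefficients in the columns of `basis`,
    plus the 1-sigma uncertainty of the fitted curve at each data point."""
    weights = 1.0 / sigma**2
    fisher = basis.T @ (weights[:, None] * basis)        # inverse coefficient covariance
    cov = np.linalg.inv(fisher)
    coeff = cov @ (basis.T @ (weights * data))
    # propagate the coefficient covariance to the fitted curve
    curve_err = np.sqrt(np.einsum("ij,jk,ik->i", basis, cov, basis))
    return coeff, curve_err

rng = np.random.default_rng(2)
n = 117
x = np.linspace(0.0, 1.0, n)
# Stand-in orthonormal basis: QR-orthonormalised polynomials 1, x, x^2.
basis, _ = np.linalg.qr(np.vander(x, 3, increasing=True))
sigma = np.full(n, 0.1)                                  # hypothetical data errors
data = 2.0 * basis[:, 0] + rng.normal(scale=sigma)       # mock data described by one mode

c1, err1 = fit_in_basis(data, sigma, basis[:, :1])       # one coefficient
c3, err3 = fit_in_basis(data, sigma, basis[:, :3])       # three coefficients
print(err1.max(), err3.max())
```

For an orthonormal basis and uniform errors the variance of the fitted curve is the sum of the squared basis functions times the data variance, so every additional mode can only enlarge the error band.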

As a second test case, we now consider a more challenging
data set in order to test the method's capability of capturing
behaviours not explicitly described by the training-set models.
In this example, we use the same training set as in the previous
case but we analyze luminosity distances resulting from a
toy-model cosmology with a sudden transition in the expansion rate
(see MB08 for details). In this case, the observational
characteristics of the sample are modelled after the proposed satellite SNAP
(Aldering et al. 2004). A $\chi^2$-analysis shows that these simulated
data are compatible with a standard Friedmann ΛCDM cosmology.
This is of course a misleading result since the background
cosmology has a completely different nature and the sudden
transition in $H(a)$ is not highlighted. This demonstrates how
a standard $\chi^2$-approach is not always capable of revealing the
unexpected features hidden in the data set and how the result
is bound to the theoretical prejudice. In contrast, with the
proposed method this expansion rate transition is observed even if
the training set does not contain any model with such a feature.



Fig. 4. Same as in Fig. 3 but for a SNAP-like mock catalogue
based on a toy-model simulation with a sharp transition
in the expansion rate (legend: simulated data; PCA, 4 coefficients;
MB08, 6 coefficients). Even if the training set was defined on
Friedmann models without such a feature, the reconstruction
was capable of highlighting it.



The reconstructed expansion rate for this example is shown in
Fig. 4 (blue error bars) together with the one obtained with the
method as originally proposed by MB08 (shaded area). The
accuracy improvement with respect to the original MB08 method
may not appear striking, but we have to consider that this is a
very extreme case where not even the best-fit Friedmann model
is covered by the training set. This example just demonstrates
the method's capability of capturing unexpected features even
if optimized only for a specific set of cosmologies.

4.3. Results on the SNLS data set 

The SuperNova Legacy Survey data set consists of 118 supernovae
in the redshift range $0.015 < z < 1.01$, 71 observed with
the Canada-France-Hawaii Telescope and 44 taken from the
literature (Astier et al. 2006). We analyzed this sample with the
same procedure applied to the ΛCDM simulation discussed in
Sec. (4.2). The result shown in Fig. 5 is fully compatible with
the best ΛCDM model fit (Astier et al. 2006). The accuracy is
largely improved with respect to the original MB08 reconstruction,
giving hope that future supernova samples may be able
to reveal the nature of dark energy.

Fig. 5. Same as in Fig. 3 but for the real SNLS data set (legend:
SNLS data; PCA, 1 coefficient; MB08, 3 coefficients). The
reconstruction accuracy is largely improved and is fully
compatible with the best-fit ΛCDM model.



5. Conclusions 

We defined an optimal basis system based on principal compo- 
nents to decompose cosmological observables. The principal 
components are defined starting not from the data but from an 
ensemble of given models. The basis system can be optimized 
for a specific cosmological model or for an ensemble of models 
even if based on different physical hypotheses. We suggest two 
main applications: (1) to define a cosmological parametrization 
applicable to any model independently of the physical back- 
ground and (2) to optimize the MB08 method for a direct esti- 
mate of the expansion rate from luminosity distance data. 

On one hand, the cosmological parametrization is based 
on the coefficients, i.e. the features, resulting from the observ- 
ables projection on the discussed basis system. The features are 
fully independent, avoiding the degeneracies and redundancies 
of physical parameters, and their number is minimal with re- 
spect to the data accuracy. Since they quantify observable prop- 
erties, they can be used as a common parametrization to de- 
scribe cosmologies independently of their background physics. 
However, they can be uniquely related to physical parameters 
once a model is specified. In addition, this parameterization makes
it possible to quantify the differences between cosmologies
in terms of a simple metric separation. 



On the other hand, the method proposed by MB08, which
directly estimates the expansion rate from supernova data,
requires expanding the luminosity distances into a set of arbitrary
orthonormal functions. The use of the basis system derived in
this work largely reduces the resulting uncertainties
and, even if the method is only optimized for a single model or for
an ensemble of cosmological models, it is still capable of
detecting unforeseen features not included in the algorithm setup, as
demonstrated.